Abstract: Happiness travels quickly in comparison to sadness or disgust, but the proliferation of anger and fear surpasses them
all. This defines the bottom line of information virality on social media. Pertinent psychological studies convey that human
emotions may be ‘activated’ or ‘deactivated’ to drive people to take action. Based on this, we propose the use of cognitive
behavioural features to assess the virality of information in tweets by finding a dominant emotion of the same type across tweets as
an indicator of viral spread. Fluctuations in emotions convey uncertainty and may reduce the frequency and intensity of
discussion of a trending topic. The proposed virality prediction framework detects the emotion quotient (EQ), a measure of
emotional intensity associated with five emotions, namely, fear, disgust, sadness, anger and happiness, for the exposed
information in tweets to predict its outburst, i.e., virality, pertaining to social and political issues. A hybrid (lexicon +
supervised learning) approach using parts-of-speech (adjectives, adverbs, verbs) and emoticons is proposed to transform the tweet
into an emotional vector representative of the sentimental value for a trending topic. This emotional quantifier is then used as
empirical evidence to determine the likelihood of information going viral, based on the strength of emotion in tweets and their
number of re-tweets. Preliminary results indicate that the approach is effective at identifying information virality.


Keywords: Viral, Twitter, Emotion.

I. Introduction 


Social media has the power to make any information, be it
true or false, go viral and reach and affect millions. Good or
bad, true or false, useful or useless, all kinds of information
proliferate through social web platforms. The
widespread activation of information propagation across
meta-networks is referred to as “virality”. The magnitude
of social media virality cannot be overstated. It can bring
fame and prosperity but at the same time can beget notoriety
and nuisance. Social networks have been witness to
self-reinforcing echo chambers, which foster a confirmation bias
(a false sense of affirmation that we are right in our beliefs)
and a relevance paradox (readers only consume information
that is relevant to them, which tends to be one-sided). Twitter is one of
the most popular social networks worldwide and, as per the
statistics for the first quarter of 2018, this micro-blogging
service averaged 336 million monthly active users globally
[1]. The platform is used as a communication channel by
businesses, celebrities and even governments. Vigorous
participation in such channels can be intentional or
unintentional, with activities ranging from supporting a
cause, getting involved, expressing personal feelings or
beliefs, attention seeking, pursuing personal ambitions,
finger-pointing, viral marketing and pranks to spreading fear and hatred.
Information virality refers to the cascading effect
of information spread online, which eventually proliferates
across meta-networks and affects millions. In October 2017,
the #MeToo movement created a wave of global reckoning
as it was posted by women who said they had faced sexual
harassment and assault [2]. The impact of these two words
was so great that it soared across social media, including
Facebook and Instagram. It was a seismic event which
demonstrated the strength of social platforms and their virality.


[Figure 1: screenshot of a tweet by Alyssa Milano, dated Oct 16, 2017,
with 24.5K retweets and 53K likes. Tweet text: "If you've been sexually
harassed or assaulted write 'me too' as a reply to this tweet." The
attached image reads: "Me too. Suggested by a friend: 'If all the women
who have been sexually harassed or assaulted wrote ‘Me too.’ as a
status, we might give people a sense of the magnitude of the problem.'"]

Figure 1. Social Media Virality and its effect

Thus, it becomes imperative to verify the
accuracy of information and promptly inhibit false information from
spreading among Internet users, as it can jeopardize the
well-being of citizens. Pertinent psychological studies
convey that humans are intrinsically not very good at
differentiating conflicting information. Naive realism and
confirmation bias further add to the vulnerability. Though
the cascading model of tweet and re-tweet captures the virality of
a tweet over its lifetime, the likelihood of content going viral
has more to do with how activated the person felt after
reading it. Crucially, it is not just the volume of tweets that
matters, but the “homogeneity” and “irregularities” in the
emotion that can make the difference. Thus the hypothesis
laid is that “As unverified information spreads considerably
on social media, it works with the same mechanics as that of
a large protest where an outsized share of the same emotion is
representative of the response sensitivity. That is, emotions
may be ‘activated’ or ‘deactivated’ to drive people to take
action and a dominant emotion of the same type across tweets is
indicative of a viral spread. Fluctuations in emotions convey
uncertainty and may reduce the frequency and intensity of
discussion of a trending topic.” Based on this, we propose
the use of cognitive behavioural features to assess the virality
of information in tweets. The proposed technique detects the
emotion quotient (EQ), a measure of emotional intensity
associated with five emotions, namely, fear, disgust, sadness,
anger and happiness, for the exposed information in tweets to
predict its outburst, i.e., virality, pertaining to social and
political issues.

The approach is to transform the tweet into an emotional
vector representative of the sentimental value for a trending
topic. A lexicon-based technique is employed to associate
emotional values with the words in the sentence. Parts of
speech like adjectives, adverbs and some groups of verbs and
nouns have been reported as good indicators of fine-grained
sentiment across pertinent literature [3, 4, 5]. In this research,
the adjectives, the verbs and the adverbs are considered as
the emotion carriers in the sentence for feature-level emotion
analysis. In natural language, the adjectives help to
express the fundamental feelings and emotions within a
tweet. The verbs operate as polarity markers as they convey
the tone associated with the emotion. Similarly, the adverbs
act as emotion bolsters, which scale the emotion polarity in
terms of strength. For example, the occurrence of the adverb
“not” in “not bad” inverts the emotion value of the next word,
whereas the occurrence of the adverb “ruthlessly” amplifies the
emotion value of the next word. The use of emoticons has
become mainstream in social content writing and
cannot be ignored, as emoticons suggest adjectives which
add tone and clarity to the communication. In short,
emoticons influence emotional communication. Studies
suggest that emoticons, when used in conjunction with a
written message, can help to increase the “intensity” of its
intended meaning. Thus, the emotion analysis tool works by
assigning an emotion value to each adjective and emoticon in
the sentence and obtaining the polarity value of verbs and the
strength of adverbs.

In order to set the benchmark for empirical analysis with the
created adjective emotion lexicon base, we apply classifiers.
We analyze six supervised learning algorithms, namely,
Support Vector Machine (SVM), Decision Trees (DT),
Logistic Regression (LR), Multi-layer Perceptron (MLP),
Random Forest (RF) and K-Nearest Neighbors (K-NN), for
predicting the adjective emotion values for each tweet. The
emotion quotient for each tweet is then calculated using a
linear equation with scores from all four lexicon bases. This
patterning of emotions with time, along with the number of
times a tweet is re-tweeted, measures the viral value of a
tweet. Finally, the cumulative strength of viral values across
all tweets is computed to detect a strong indicator of viral
spread, i.e., virality of information. Once a tweet is identified
as viral, tools and techniques that authenticate its source and
veracity can be employed to mitigate any intentional and
wrongful circulation. Further, this technique can be
considered as a preliminary step to detect a possible rumour
for which the actual truth value needs to be determined
accurately and without delay.

The rest of the paper is organized as follows: Section 2 
discusses the background work in this direction of virality 
prediction and specifically the use of emotions in viral posts 
on Twitter. Section 3 puts forward the details of the proposed 
framework, the Virality Prediction Framework followed by 
its implementation in section 4. Section 5 illustrates the 
results obtained and their analysis followed by the 
conclusion in section 6. 

II. Related Work 

The term ‘virality’ originally comes from the biological sciences,
where viruses spread contagiously among organisms. More
recently, the term has found a new technological meaning
with its social media presence. It is more than basic
person-to-person broadcasting and relies on word-of-mouth.
“Going viral” and “viral marketing” are two buzz terms
that dominate online marketing and economics. Primary and
secondary studies have reported on the virality of content
(tweets, posts, videos, photos) on social media.

Weng et al. [6] proposed a prediction model for information
virality detection on Twitter using data about community
structure. They show that, while most memes indeed spread
like complex contagions, a few viral memes spread across
many communities, like diseases. Using the proposed model,
the authors also predict the future popularity of a meme
by quantifying its early spreading pattern in terms of
community concentration. Hoang et al. [7] present a virality
model of Twitter content to find viral tweets, viral users and
viral topics. The highly viral messages, topics and users in
GE2011 are extracted and evaluated using the model.

Berger and Milkman [8] pioneered a
psychological approach to online content virality. The
authors examine the relationship between emotion and
transmission to understand what becomes viral. Hansen et al.
[9] study the relation between affect and virality in the light of
psychological and sentimental arousal
theories. The dataset includes three corpora: tweets about the
COP15 climate summit, random tweets, and a text corpus
including news. The findings also present evidence that
negative sentiment enhances virality in the news.

The work presented in this paper is based on the hypothesis
of Berger and Milkman [8] that “Virality isn’t born but
made”. That is, it is made by the users, for the users and to
the users, and is motivated by why users converse and share
information, which is psychologically and emotionally
triggered.


III. Information Virality Prediction Framework


The intent of the work proposed in this research is to create a 
framework that will enable predicting a viral tweet by virtue 
of its public emotion strength. The following figure 2 
depicts the proposed framework. 



Figure 2. Information Virality Prediction Framework

As a typical text mining task, this framework consists of
three modules, namely, the pre-processing module, the
emotion indicator lexicon module and the virality scoring
module.


A. Pre-processing Module 

The tweets pertaining to a topic (#topic) are extracted from
the publicly available Twitter data using its API. In
order to intelligently mine the text in tweets, pre-processing
is done to clean and transform the data for relevant
feature extraction.

• Primarily, the pre-processing includes cleaning the text
by removing redundant tweets, all URLs, hashtags,
@usernames and non-English words, followed by the
transformation of text for relevant feature extraction.
Sometimes people may use hashtags to convey direct
and explicit emotions, for example #sad, but we have
omitted these as our main aim is to predict the strength of
emotion and not just the emotion.

• Text transformation first replaces the emoticons in the
text with their descriptive text or phrase. As the name
suggests, emoticons are emotion icons and convey
emotions similar to human facial expressions. Their use
has become mainstream, so they cannot be
omitted, as they suggest adjectives which add tone and
clarity to the communication. Emoticons influence
emotional communication. Researchers found that
emoticons, when used in conjunction with a written
message, can help to increase the “intensity” of its
intended meaning [10]. For example, the emoticon :(
will be replaced by its description “sad face” and will be
assigned an emotion strength value of -0.5. Thus we
replace all the emoticons with their description and
polarity using the values presented in table 1 below.
The list is an updated version of our earlier attempt [3]
to decipher and use emoticons.


Table 1. Emoticons

Emoticon      Description                   Emotion Strength
:D            Big Grin                       1
XD            Laughing                       1
<3            Heart                          1
:), =), >)    Happy, Smile                   0.5
:*            Kiss                           0.5
0:)           Angelic                        0.5
:|, :-|       Straight Face, Indifferent     0
:\            Undecided                      0
:(, =(        Sad                           -0.5
</3           Broken Heart                  -0.5
=0, :-o       Shocked                       -0.5
:’(           Cry                           -1
X-(           Angry, Frown                  -1
xP            Disgusted                     -1


It is important to note that although the use of
emoticons like Winking ;) and Sticking tongue out :P is
widespread, it opens up a new avenue of research, as the
use of these emoticons is related to a sarcastic, humorous,
non-serious, joking tone of the post, which may completely
reverse the emotion conveyed by the textual indicators. For
example, the tweet “We will all be killed then...Lets meet in
heaven ;)” has a humorous tone, whereas textual emotion
analytics will detect it as a negative one. For the
framework defined in this paper, we have omitted
such emoticons and have only considered the ones
defined in table 1.
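
The cleaning and emoticon-replacement steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `preprocess`, the regular expressions and the token-by-token handling are hypothetical, the dictionary holds only a subset of table 1, and the removal of non-English words and redundant tweets is omitted.

```python
import re

# Subset of table 1: emoticon -> (description, emotion strength).
EMOTICONS = {
    ":D": ("big grin", 1.0),
    "XD": ("laughing", 1.0),
    "<3": ("heart", 1.0),
    "=)": ("happy", 0.5),
    ":(": ("sad", -0.5),
    "=(": ("sad", -0.5),
    "</3": ("broken heart", -0.5),
    ":-o": ("shocked", -0.5),
    ":'(": ("cry", -1.0),
    "X-(": ("angry", -1.0),
}


def preprocess(tweet):
    """Return (cleaned text, list of emoticon strengths) for one tweet."""
    text = tweet.replace("\u2019", "'")          # normalise typographic apostrophes
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)         # drop @usernames and #hashtags
    words, strengths = [], []
    for token in text.split():
        if token == "RT":                        # drop the re-tweet marker
            continue
        if token in EMOTICONS:
            description, strength = EMOTICONS[token]
            words.extend(description.split())    # emoticon -> descriptive text
            strengths.append(strength)
        else:
            # keep alphabetic runs only, which also strips punctuation
            words.extend(re.findall(r"[A-Za-z]+", token))
    return " ".join(words), strengths


sample = ("After the brutal shootout in school, children harmed...bombs "
          "to kill more! I am scared :'( #Texasshootout #lifeunderthreat")
cleaned, strengths = preprocess(sample)
print(cleaned)
print(strengths)
```

Matching emoticons as whole whitespace-separated tokens is a deliberate simplification: it avoids accidental matches inside ordinary words, at the cost of missing emoticons that are glued to neighbouring text.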

Next, using a POS tagger, only the adjectives, verbs and the 
adverbs are extracted to build the feature set. The emotion 
scores are then assigned to these to compute the final 
emotion quotient for the tweet. 
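
As a small sketch of this filtering step, the snippet below keeps only the emotion indicators from a list of already tagged tokens. The tag names ("ADJ", "VERB", "ADV", "EMOTICON") and the function name `emotion_indicators` are assumptions made for illustration; a real POS tagger will use its own tagset, and keeping the emoticon descriptions alongside the three parts of speech is likewise an assumption.

```python
# Hypothetical tag names; map them to whatever tagset the POS tagger emits.
KEPT_TAGS = {"ADJ", "VERB", "ADV", "EMOTICON"}


def emotion_indicators(tagged_tokens):
    """Keep (word, tag) pairs whose tag marks an emotion indicator, in order."""
    return [(word, tag) for word, tag in tagged_tokens if tag in KEPT_TAGS]


tagged = [("the", "DET"), ("brutal", "ADJ"), ("shoot", "VERB"),
          ("school", "NOUN"), ("more", "ADV"), ("cry", "EMOTICON")]
indicators = emotion_indicators(tagged)
print(indicators)
```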

B. Emotion Indicators Lexicon Module 

The adjectives, verbs and adverbs are expressions
of sentiment which convey emotions strongly. An adjective is
the part-of-speech which describes, qualifies and identifies a
noun or pronoun. Verbs express activity in terms of an
action, an occurrence, or a state of being. Adverbs are words
that modify the meaning of a verb or an adjective. In unison, these
three parts-of-speech and emoticons quantify the emotion
strength and will assist in capturing the growing emotional
response of online users associated with a topic (an event, a
person, a place, an issue). The lexicons for all three
emotion indicators are created and assigned values through a
crowdsourcing initiative. Also, supervised learning models
have been empirically analyzed for prediction of the adjective
emotion category. The details of each lexicon are explained below.

A corpus of the most commonly used adjectives, created and
validated in our earlier research [4], has been used for
creating and assigning values to emotion tuples. Sample
emotion tuple values for a few adjectives are shown in table
2. The emotion values are assigned on a scale of 0 to 5 for
the five emotions in the vector, namely, fear, disgust, sadness,
anger and happiness.


Table 2. Adjective Emotion Values

Adjective     Happiness   Anger   Sadness   Fear   Disgust
damaging      1.33        3.5     3.06      2.73   2.42
dirty         1.28        2.3     1.94      1.94   3.7
easy          3.92        1.11    1.15      1.19   1.09
easygoing     3.98        1.14    1.14      1.14   1.11
ecstatic      4.08        1.34    1.31      1.8    1.52
elated        3.93        1.21    1.19      1.17   1.12
famous        3.32        1.3     1.21      1.2    1.38
fantastic     4.07        1.19    1.31      1.25   1.22
greedy        1.41        3.14    2.68      2.27   2.94
hard          1.65        2.22    1.75      2.21   1.4
innocent      3.17        1.37    1.49      1.66   1.27
lazy          1.49        2.01    1.83      1.4    2.39
menacing      1.17        2.94    1.78      1.97   2.18
merry         4.38        1.07    1.14      1.08   1.08
noisy         1.39        2.97    1.39      1.41   1.45
nonchalant    1.85        1.4     1.31      1.26   1.47
protected     4.11        1.24    1.33      1.47   1.08
proud         3.18        1.55    1.29      1.58   1.26
quartan       1.39        1.18    1.17      1.17   1.15
rejected      1.05        3.5     3.91      3.47   2
relaxed       4.32        1.12    1.14      1.1    1.04
scared        1.14        2.41    3.02      4.09   1.83
scornful      1.16        3.31    2.13      2.17   1.74
serious       1.45        1.92    1.84      1.97   1.29


Further, we analyze six supervised learning algorithms,
namely, Support Vector Machine, Decision Trees, Logistic
Regression, Multi-layer Perceptron, Random Forest and
K-Nearest Neighbors, for predicting the adjective emotion
values for each tweet. The details of these techniques are
given in table 3 below:


Table 3. Supervised Learning Techniques

Technique: Logistic Regression (LR)
Description: One of the most basic classification techniques,
logistic regression utilizes a logistic function, also
known as the sigmoid function. It associates each
input value with a coefficient (θ), and trains the
given system to adapt to the expected output value by
modifying these θ values.

Technique: K-Nearest Neighbours (K-NN)
Description: K-NN is a classification algorithm that is based
on feature similarity; that is, it focuses on
similarities between values in a class. It treats
input values as vectors in a feature space, and
classifies an input by the votes of its k nearest neighbors.
K-NN is a lazy learning algorithm; it does not build a
generalized model from the available data, but instead
stores the data as it is.

Technique: Support Vector Machines (SVM)
Description: SVM maps the data points into a space in such a
way that there is a clearly defined gap between the
classes. Its approach depends on the number of
classes and the representation of its mapping.

Technique: Decision Tree (DT)
Description: A DT symbolizes a set of rules, which help us to
determine the class an input belongs to. The
decision-making process of these trees starts from
the root, traverses downwards and ends at the
leaves. The leaf nodes of a decision tree represent
the class labels. The other nodes are
called decision nodes, which test the value of an attribute and
determine the branch to follow as we
go downwards. Training involves selecting the
appropriate attribute to split the tree at each stage,
while keeping the tree compact and organized.

Technique: Random Forests (RF)
Description: The RF algorithm overcomes the limitations of the DT
method by creating a forest of trees. Increasing
the number of trees generally improves the accuracy of
the system, up to a point. It selects random subsets of the
training input with replacement and fits decision
trees to those samples, a procedure also called
bagging. This technique decreases the variance of
the model by averaging across many trees,
thereby cancelling noise and improving its ability to
generalize.

Technique: Multi-layer Perceptron (MLP)
Description: MLP is a type of feed-forward artificial neural
network which uses back propagation as a
supervised learning technique. MLPs can adjust
themselves to the data without any explicit
specification of functional or distributional form
for the underlying model.


Thus, the adjectives are analyzed and classified into five
pre-defined emotion categories, namely Happiness, Anger,
Sadness, Fear and Disgust. The classification results are
evaluated using precision, recall, accuracy and F-score as
the performance measures. We discuss the results in section
5.
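
For reference, the four performance measures can be computed from true and predicted emotion labels as in the sketch below. The paper does not state how the per-class scores are averaged, so macro-averaging (an unweighted mean over classes) is an assumption here, as are the function name `evaluate` and the toy labels.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall, f_score), macro-averaged over classes."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)

    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f_scores) / n


# Toy example with hypothetical labels for four adjectives.
y_true = ["anger", "fear", "fear", "happiness"]
y_pred = ["anger", "fear", "anger", "happiness"]
accuracy, precision, recall, f_score = evaluate(y_true, y_pred)
print(accuracy, precision, recall, f_score)
```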

Out of the five emotion categories considered for this work,
happiness is the only emotion which has a positive polarity,
whereas the other four, namely, anger, sadness, fear and
disgust, have negative polarity. The natural language words
conveying anger, sadness, fear and disgust are often related
to anxiety and depression in humans. These are the “trigger”
emotions which drive people to take action, which makes it
more likely that content is passed on as a chain reaction. Thus, to
identify the category of emotions we determine the polarity
(positive or negative) of the verbs. An emotion polarity
lexicon base for the 100 most commonly used verbs is created
and the polarity values are assigned within the range of +1 to
-1. Further, the strength of this polarity is assessed using an
adverb emotion polarity strength lexicon base created for this
research. The respective emotion polarities and strengths
within both lexicon bases have been gathered through
a crowd-sourcing task. The polarity strength values and
emotion polarities for a few adverbs and verbs are shown in table
4 and table 5 respectively.


Table 4. Adverb Emotion Polarity Strength

Adverb        Emotion Polarity Strength
Extremely     +1
Terribly       0.9
Seriously      0.8
Totally        0.7
Completely     0.6
Most           0.5
Too            0.4
Very           0.4
Highly         0.4
Pretty         0.3
More           0.2
Much           0.1
Any           -0.1
Quite         -0.2
Just          -0.3
Little        -0.4
Dimly         -0.5
Less          -0.6
Not           -0.8
Never         -0.9
Hardly        -1


Table 5. Verb Emotion Polarity

Verb          Emotion Polarity
Love           1
Adore          0.9
Won            0.9
Like           0.8
Enjoy          0.7
Kiss           0.7
Smile          0.6
Impress        0.5
Attract        0.4
Excite         0.3
Relax          0.2
Kill          -1
Shoot         -1
Revenge       -1
Hate          -1
Destruct      -0.9
Harm          -0.9
Hurt          -0.8
Fight         -0.8
Beat          -0.7
Hit           -0.7
Yell          -0.6
Lost          -0.5
End           -0.4
Detest        -0.2
Reject        -0.1


Seed lists of positive and negative adverbs and verbs
whose orientation is known are created and then grown using
WordNet [11]. That is, each adverb and verb
occurring in a tweet is checked for its presence in the seed
list. If it is a hit, its value is assigned and returned; in
case of a miss, WordNet is used to find a synonym or
antonym with a known value, and the value is assigned
accordingly.
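
The hit/miss lookup can be sketched as below. To keep the example self-contained, two small hand-written dictionaries stand in for WordNet synonym and antonym queries; they, the function name `lookup_polarity`, and the rule that an antonym receives the negated value are assumptions for illustration (the text above only says the value is assigned "accordingly"). The seed values are taken from table 5.

```python
# Seed list: a few verb polarities from table 5.
SEED_VERBS = {"love": 1.0, "enjoy": 0.7, "hate": -1.0, "hurt": -0.8}

# Hypothetical stand-ins for WordNet synonym / antonym lookups.
SYNONYMS = {"injure": ["wound", "hurt"], "cherish": ["love"]}
ANTONYMS = {"dislike": ["like", "love"]}


def lookup_polarity(word, seed, synonyms, antonyms):
    """Return the polarity of `word`, growing `seed` on a synonym/antonym match."""
    word = word.lower()
    if word in seed:                      # hit: value already known
        return seed[word]
    for candidate in synonyms.get(word, []):
        if candidate in seed:             # synonym with known value: copy it
            seed[word] = seed[candidate]
            return seed[word]
    for candidate in antonyms.get(word, []):
        if candidate in seed:             # antonym with known value: negate it
            seed[word] = -seed[candidate]
            return seed[word]
    return None                           # orientation unknown


print(lookup_polarity("Love", SEED_VERBS, SYNONYMS, ANTONYMS))
print(lookup_polarity("injure", SEED_VERBS, SYNONYMS, ANTONYMS))
print(lookup_polarity("dislike", SEED_VERBS, SYNONYMS, ANTONYMS))
```

Storing each resolved word back into the seed dictionary is what "grows" the list, so repeated occurrences of the same word become direct hits.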


C. Scoring Module 

Once the emotion values from all indicators are extracted, the
next step is to gauge the emotion quotient (EQ) of the tweet, which is
subsequently used to calculate the viral value of a tweet and the
virality of a topic. The EQ is calculated using equation (1):

$$EQ = \frac{1}{a+b+c+d}\left(\frac{\sum_{i=1}^{n}|E_{adj_i}|}{n \times 5} + \frac{\sum_{j=1}^{m}|E_{vb_j}|}{m} + \frac{\sum_{k=1}^{p}|E_{avb_k}|}{p} + \frac{\sum_{l=1}^{q}|E_{emot_l}|}{q}\right) \quad (1)$$

where E_adj, E_vb, E_avb and E_emot are the emotion values of the
adjective, verb, adverb and emoticon respectively. As these
quantify the strength of the emotion, we take the absolute value
(modulus) of the values;

n, m, p and q are the number of adjectives, verbs, adverbs
and emoticons present in the tweet (a term whose count is zero is
omitted);

The parameters a, b, c and d are used to signify the presence
of the emotion indicators. For example, if an adjective is
absent, the value of a will be 0 and if it is present the value of a
will be 1. This has been done to dampen the values of the
emotion quotient such that they are normalized within the
range of 0 to 1. The values of the parameters are assessed as
shown in table 6 below:


Table 6: Parameter Values

Parameter     Value = 0     Value = 1
a             n = 0         n > 0
b             m = 0         m > 0
c             p = 0         p > 0
d             q = 0         q > 0
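
A minimal Python sketch of equation (1) together with the presence parameters of table 6 is given below. The function name `emotion_quotient` is hypothetical, and two points are assumptions rather than statements from the text: each adjective is represented by a single value on the 0 to 5 scale (for instance the value of its classified emotion), and an indicator type that is absent simply contributes nothing. The example values are taken from tables 2, 4 and 5 ("dirty" with disgust 3.7, "hate" with -1, "very" with 0.4).

```python
def emotion_quotient(adjectives, verbs, adverbs, emoticons):
    """Emotion quotient of one tweet, normalised to the range 0..1.

    Each argument is a list of lexicon values for that indicator type.
    Adjective values are on a 0-5 scale; the others lie in -1..+1.
    """
    groups = [
        (adjectives, 5.0),   # divide adjective values by 5 to reach 0..1
        (verbs, 1.0),
        (adverbs, 1.0),
        (emoticons, 1.0),
    ]
    present = 0              # a + b + c + d from table 6
    total = 0.0
    for values, scale in groups:
        if values:           # parameter is 1 only when the count is > 0
            present += 1
            total += sum(abs(v) for v in values) / (len(values) * scale)
    return total / present if present else 0.0


# "dirty" (adjective), "hate" (verb), "very" (adverb), no emoticon.
eq = emotion_quotient([3.7], [-1.0], [0.4], [])
print(round(eq, 4))
```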


Next, based on the emotion quotient of a tweet, the viral
value of the tweet (VV_tweet) is calculated using the following
equation (2):

$$VV_{tweet} = Polarity \times \left[\frac{EQ \times R}{T}\right] \quad (2)$$

where Polarity is expressed in terms of positive or negative
sentiment (indicated by + or -). It determines the emotional
factor of the post. Out of the five emotions considered, fear,
anger, sadness and disgust are negative emotions whereas
happiness is a positive one. But in the absence of an adjective
in the tweet, this polarity classification is not possible. So, we
propose that, as the adverbs qualify adjectives and verbs, the
adjective group (adjective*adverb) or the verb group
(verb*adverb) polarities will determine the overall polarity of
a particular tweet. This is imperative in determining the
emotional orientation of the posts, as the strength of the same
emotion type will be a yardstick of virality;

EQ is the emotion quotient of the tweet calculated using
equation (1);

R is the number of re-tweets, that is, the total number of times the
tweet has been reposted;

T is the time, that is, the life span of the tweet counted in
number of days.
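
Reading the definitions above as polarity times EQ times re-tweets per day, the viral value can be sketched as follows. The function name `viral_value` and the guard against a non-positive life span are assumptions; the first call uses the figures of the sample tweet discussed in the illustration section (EQ 0.574, 870 re-tweets in 1 day, negative polarity).

```python
def viral_value(polarity, eq, retweets, lifespan_days):
    """Signed viral value of a tweet: polarity * EQ * re-tweets per day."""
    if lifespan_days <= 0:
        raise ValueError("lifespan_days must be positive")
    return polarity * eq * retweets / lifespan_days


vv = viral_value(-1, 0.574, 870, 1)
print(round(vv, 2))

# The same number of re-tweets spread over a week yields a smaller value.
print(viral_value(1, 0.5, 5000, 1) > viral_value(1, 0.5, 5000, 7))
```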

In most models, the volume of re-tweets is the key
indicator of virality, but this gives any topic with many
re-tweets a high viral value even if it is a post from
the past. The rationale for including the life span is that a tweet
with more than 5000 re-tweets in a single day has more viral value
than the same 5000 re-tweets in 7 days. Moreover, social media platforms
like YouTube define viral videos as videos with more than 5
million views in a span of 1 week, but no such virality
benchmarking has been done for Twitter posts. So the main
aim is to detect the virality of a post in the present so that
steps to mitigate the spread of wrongful information
can be taken promptly.


A transaction file is maintained for each tweet on the topic,
storing the emotion quotient, its polarity, number of re-tweets,
life span and the viral value of the tweet. Thus the virality
score for information, V_info, is the cumulative emotion
quotient calculated using the following formula in equation
(3):

$$V_{info} = \sum_{i=1}^{t} EQ_i \quad \text{for } i = 1, \ldots, t \text{ tweets} \in \#topic \quad (3)$$

As discussed earlier, out of the five emotions considered,
fear, anger, sadness and disgust are negative emotions
whereas happiness is a positive one. A cumulative negative
viral score is indicative of a similar sense of outrage among
the members of the virtual community. Thus, the strength of the
same emotion across posts is the yardstick of virality. That
is, the information further needs to be checked for veracity
and origin to restrict the flare-up of rumour.
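
The topic-level aggregation can be sketched as below. Equation (3) is printed as a sum over EQ_i, whereas the worked illustration in the next section accumulates the signed viral values of the tweets; this sketch follows the illustration, which is an interpretation rather than a confirmed reading. The function name `topic_virality` is hypothetical, the default threshold of 5000 is the one reported in the illustration, and comparing the magnitude of the score against it is an assumption. The input values are the viral values of the sample tweet and the five tweets of table 7.

```python
def topic_virality(viral_values, threshold=5000.0):
    """Return (cumulative signed score, whether its magnitude exceeds threshold)."""
    score = sum(viral_values)
    return score, abs(score) > threshold


# Signed viral values of the six #texasshootout tweets in the illustration.
values = [-499.38, -716.59, -431.62, -1180.48, 186.90, -2885.45]
score, is_viral = topic_virality(values)
print(round(score, 2), is_viral)
```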

The implementation details and a sample calculation are 
illustrated in the next section. 

IV. Illustration

The work carried out encompasses the following:

• Feature engineering

• Implementation of six supervised learning
techniques to empirically identify the better classifier
for adjective emotion value detection

• Quantifying the emotional value of a tweet and the
cumulative emotional value across tweets for a
topic

• Virality scoring

To illustrate the proposed
method, a case study is presented with a sample set of tweets.

Sample Tweet: Let us consider a sample tweet on the trending
topic #Texasshootout, which has 870 re-tweets in 1 day, and
compute its emotion quotient (EQ), polarity and viral value
(VV_tweet).

After the brutal shootout in school, children
harmed...bombs to kill more! I am scared :’(
#Texasshootout #lifeunderthreat


A. Pre-processing of Tweets

After downloading tweets using the #topic, the data is
cleaned by removing hashtags, usernames, hyperlinks, the RT
symbol, punctuation and non-English characters. The
emoticons are transformed to the descriptions defined in
table 1. Stemming and tokenization are also performed as part of
pre-processing. Stemming reduces each word to its root;
for example, it reduces “harming”
to its root word, “harm”. The pre-processed sample tweet is:

After the brutal shoot in school children harm bomb to
kill more I am scare cry

B. POS Tagging

Subsequent to the pre-processing, only the adjectives,
adverbs and verbs are extracted for the feature set. Each
tweet is parsed using the CMU Twitter POS tagger. The
resultant file is a list of tweets that only have adjectives,
verbs and adverbs (in the original order), which are referred
to as emotion indicators. For the sample tweet:

Word      Tag
brutal    ADJECTIVE
shoot     VERB
harm      VERB
kill      VERB
more      ADVERB
scare     ADJECTIVE
cry       EMOTICON

C. Emotion Scoring

Once the POS tagging is done, the words are scored using
the crowd-sourced lexicon values. The above parsed tweet is
thus scored as follows:

• Here we can see that “brutal” and “scare” are
adjectives, “shoot”, “kill” and “harm” are verbs, “more”
is an adverb and “cry” is the description of an
emoticon.

• The adjective emotion values of “brutal” and
“scare” are represented by the vectors [1.16, 3.65,
2.99, 3.28, 2.86] and [1.14, 3.31, 2.13, 4.09, 1.83]
respectively, such that the values in each vector are
representative of [<Happiness>, <Anger>,
<Sadness>, <Fear>, <Disgust>] as shown in table 2.

• The classifier detects the emotion of the adjectives
as Anger for “brutal” and Fear for “scare”, which
are both negative emotions, giving a polarity of -1
to the tweet.

• The emotion polarities for the verbs “shoot”, “kill”
and “harm” are assigned as -1, -1 and -0.9 respectively.

• From the list of adverbs we get the emotion polarity
strength value of “more” as 0.2 (from table 4).

• The polarity value of “cry” from the emoticon table 1 is -1.

• Now using equation (1), with all four indicator types present
(a = b = c = d = 1), the EQ of the tweet is
computed as follows:

$$EQ = \frac{1}{4}\left[\frac{E_{brutal} + E_{scare}}{2 \times 5} + \left|\frac{E_{shoot} + E_{kill} + E_{harm}}{3}\right| + \left|\frac{E_{more}}{1}\right| + \left|\frac{E_{cry}}{1}\right|\right]$$

$$= \frac{1}{4}\left[\frac{3.65 + 4.09}{10} + \left|\frac{(-1) + (-1) + (-0.9)}{3}\right| + \frac{0.2}{1} + \frac{1}{1}\right]$$

$$= 0.25 \times \left(0.774 + 0.322 + 0.2 + 1\right) = 0.25 \times 2.296 = 0.574$$

Thus, the EQ of the tweet is 0.574 and the polarity from the
classifier is negative, -1.

Now using equation (2), the VV_tweet is computed as
follows:

$$VV_{tweet} = (-1) \times \left[\frac{0.574 \times 870}{1}\right] = -499.38$$

• Similarly, we calculate the values for the other
tweets on the same topic, as shown in the following
table 7:


Table 7. Illustration of Scoring Module

Tweet 1: This is pretty serious...We will all be killed </3 :’(
#texasshootout #scared
Features: pretty (Adv), serious (Adj), all (Adv), kill (Vb),
broken heart, cry (Emoti)
Emotion Quotient: 0.6485   Polarity: -1   Re-tweets: 1105
Life-span: 1   Viral Value: -716.59

Tweet 2: More kills! They are terrorist! School children hurt
X-( =0 #texasshootout #godhelp
Features: more (Adv), kill (Vb), hurt (Vb), angry, shocked (Emoti)
Emotion Quotient: 0.6166   Polarity: -1   Re-tweets: 700
Life-span: 1   Viral Value: -431.62

Tweet 3: Innocent people & children killed. Are they humans?
Terrible it is Xp X-( :-o #texasshootout #rip #inhuman
Features: innocent (Adj), kill (Vb), hate (Vb), terrible (Adv),
disgusted, angry, shocked (Emoti)
Emotion Quotient: 0.842   Polarity: -1   Re-tweets: 1402
Life-span: 1   Viral Value: -1180.48

Tweet 4: Bravo! Great work... the school for rich people! :-D :P
#texasshootout #wedeserveit
Features: great (Adj), work (Vb), rich (Adj), big grin (Emoti)
Emotion Quotient: 0.756   Polarity: +1   Re-tweets: 247
Life-span: 1   Viral Value: +186.90

Tweet 5: Bombs to kill planted! Highways closed as extreme
violence reported. Scared to death ;( =0 #texasshootout #disturbed
Features: kill (Vb), plant (Vb), close (Vb), extreme (Adv),
scare (Adj), cry, shocked (Emoti)
Emotion Quotient: 0.767   Polarity: -1   Re-tweets: 3762
Life-span: 1   Viral Value: -2885.45

Virality (Info), accumulated over the sample tweet and the five
tweets above: +186.90 - 5713.52 = -5526.62



• Using equation (3), the virality of the topic is
-5526.62, and it is observed that the dominant
emotions are similar in tone for a viral topic. Based
on further experimentation, the threshold for a topic
being called “viral” has been set to 5000. So any
virality value whose magnitude is greater than 5000 implies that the
topic has a cascading effect, and steps to authenticate
its accuracy and origin must be taken by agencies
(business or government).

• The + and - simply indicate the polarity of the post.


Thus, using the proposed virality framework, the likelihood
of content going viral can be determined, and this can be an
initial step to identify and highlight information with
questionable veracity. The next section discusses the results
obtained.


V. Results and Discussion

This section highlights the results and observations related to the
performance of the proposed framework. The empirical
analysis indicates that the virality model
effectively finds viral information. The preliminary
results are encouraging.

The findings were further analyzed for the adjective emotion
classifier performance using the measures Accuracy (A),
Precision (P), Recall (R) and F-score (F) for the various
supervised learning techniques [12, 13]. The following table 8
describes the results of the adjective emotion classifier:


Table 8. Performance Results

Technique     A (%)    P       R       F
K-NN          70.6     0.71    0.71    0.71
SVM           85.6     0.85    0.86    0.86
DT            84.8     0.85    0.85    0.85
RF            89.1     0.89    0.89    0.89
MLP (NN)      91.2     0.90    0.92    0.91
LR            92.0     0.91    0.92    0.92


It is observed that Logistic Regression and Neural Networks
give the highest accuracy scores (92% and 91%
respectively). As the data was crisp and concise, high values
for all four metrics were observed. Next come RF and
SVM, with 89% and 86% accuracy. DT follows with
a comparable accuracy of 85%. K-NN showed the lowest
accuracy, around 71%. It is interesting to note that an
ensemble method such as Random Forests gave improved
results in comparison to the traditional single
Decision Tree model.

The following Figures 3, 4, 5 and 6 depict the results shown
in table 8 with the help of graphs.

Figure 3. Accuracy

Figure 4. Precision

Figure 6. F-Measure


Finally, for the following socio-political topics:
#texasshootout; #MeToo; #TakeAKnee; #Covfefe;
#YemenInquiryNow; #308Removed; #plastickills;
#KarnatakaElections2018; the framework is able to
determine the virality value of the topical information along
with its sentiment polarity and fine-grained emotion value.

VI. Conclusion and Future Scope 

Sharing online content is an indispensable part of our
contemporary lives. Consequently, it becomes
imperative to verify the authenticity of information and
promptly inhibit false information from spreading among Internet
users, as it can jeopardize the well-being of citizens. The
proposed virality framework determines the likelihood of
content going viral based on the strength of similar emotion
across the tweets on a topic. The hybrid approach makes use
of natural language textual cues of emotions from
parts-of-speech like adjectives, verbs and adverbs, and from emoticons. The
empirical evaluation of supervised learning techniques used
for emotion classification of adjectives yields the best results
for logistic regression, followed by the neural network
(multi-layer perceptron). The virality of social and political topics is
captured using the scoring module. As a future
direction of work, the fluctuations in emotions can be
captured, as they convey uncertainty towards a topic and may
assist in veracity checks or rumour stance detection. The
framework can be used to draw a correlation between virality and
rumour in order to acquire a list of potential rumours for
which the truth value needs to be determined. Also,
contextual information within the post can be assessed for
virality prediction.


Figure 5. Recall